-
Notifications
You must be signed in to change notification settings - Fork 305
Differential Binarization model #2095
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took high level pass and left some comments.
Also,
Make al the file names in follow the same format like other files, for db_utils and losses.py
* support flash-attn at torch backend * fix * fix * fix * fix conflit * fix conflit * fix conflit * fix conflit * fix conflit * fix conflit * format
* init: Add initial project structure and files * bug: Small bug related to weight loading in the conversion script * finalizing: Add TIMM preprocessing layer * incorporate reviews: Consolidate stage configurations and improve API consistency * bug: Unexpected argument error in JAX with Keras 3.5 * small addition for the D-FINE to come: No changes to the existing HGNetV2 * D-FINE JIT compile: Remove non-essential conditional statement * refactor: Address reviews and fix some nits
* Register qwen3 presets * fix format
This reverts commit 80b3c56.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added my review comment, in the issue description add the original implementation, reference paper, training colab and end to end working demo of the trained model including post processing.
| if head_kernel_list is None: | ||
| head_kernel_list = [3, 2, 2] | ||
| if image_shape is None: | ||
| image_shape = (640, 640, 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this, head_kernel_list you can add it to default argument if it is common for all the implementations, else this can be part of the config as per the specific checkpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added head_kernel_list like this, as gemini-code-reveiw suggested(Using mutable default arguments like lists or tuples is a common pitfall in Python and can lead to unexpected behavior).
I updated this part with config. Next commit able to see the changes.
| )(topdown_p2) | ||
| featuremap_p4 = layers.UpSampling2D((4, 4), dtype=dtype)(featuremap_p4) | ||
| featuremap_p3 = layers.UpSampling2D((2, 2), dtype=dtype)(featuremap_p3) | ||
| featuremap_p2 = layers.UpSampling2D((1, 1), dtype=dtype)(featuremap_p2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address this comment, it can be removed since it doesn't do anything
| run_mixed_precision_check=False, | ||
| run_quantization_check=False, | ||
| run_data_format_check=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enable these tests
| if backend.backend() == "jax": | ||
| pytest.skip( | ||
| "JAX backend does not support this test due to NaN issues." | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be you should investigate and make it work on JAX as well.
| The probability map output generated by `predict()` can be translated into | ||
| polygon representation using `model.postprocess_to_polygons()`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this postprocess_to_polygons is not implemented, you need to update the docstring accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apply the changes with relevant doc string. Next commit able to see the changes
| `map_output` now holds a 8x224x224x3 tensor, where the last dimension | ||
| corresponds to the model's probability map, threshold map and binary map | ||
| outputs. Use `postprocess_to_polygons()` to obtain a polygon | ||
| representation: | ||
| ```python | ||
| detector.postprocess_to_polygons(map_output[...,0]) | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here.
| import keras | ||
|
|
||
|
|
||
| def Polygon(coords): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow snake casing
| image_size=(640, 640), | ||
| annotation_size=(640, 360), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are we using these two arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove annotation_size argument which I missed it. As earlier I was using diffbin_utils function with image text data preprocessor. Next commit able to see the changes
|
|
||
|
|
||
| @keras_hub_export("keras_hub.models.ImageTextDetectorPreprocessor") | ||
| class ImageTextDetectorPreprocessor(Preprocessor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any difference between ImageClassifierPreprocessor and this, if it is used only for preprocessing images, may be you can just use ImageClassifierPreprocessor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because earlier I was using diffbin_utils functions(some functions like cv2,shapely created with keras.ops). But this was giving not accurate result as with cv2 and shapely result. So I used this ImageClassifierPreprocessor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we can subclass ImageClassifierPreprocessor, like other vision models
Differential Binarization model.
Notebook: https://colab.sandbox.google.com/gist/mehtamansi29/86289c0ef273772f5de309ce20d0292d/diffbin_training_2.ipynb